---
title: "Ordered Beta Regression: A Parsimonious, Well-Fitting Model for Continuous Data with Lower and Upper Bounds"
bibliography: BibTexDatabase.bib
date: November 26, 2021
geometry: margin=1in
output: 
  bookdown::pdf_document2:
      keep_tex: true
      toc: false
      includes:
        in_header: preamble.tex
author: |
  | Robert Kubinec
  | New York University Abu Dhabi
abstract: "I propose a new model, ordered Beta regression, for continuous distributions with both lower and upper bounds, such as data arising from survey slider scales, visual analog scales, and dose-response relationships. This model employs the cutpoint technique popularized by ordered logit to fit a single linear model to both continuous (0,1) and degenerate [0,1] responses. The model can be estimated with or without observations at the bounds, and as such is a general solution for this type of data. Employing a Monte Carlo simulation, I show that the model is noticeably more efficient than ordinary least squares regression, zero-and-one-inflated Beta regression, re-scaled Beta regression and fractional logit while fully capturing nuances in the outcome. I apply the model to a replication of the Aidt  and Jensen (2012) study of suffrage extensions in Europe. The model can be fit with the R package `ordbetareg` to facilitate hierarchical, dynamic and multivariate modeling.^[For a reproducible version of this paper, please see the Github repository https://github.com/saudiwin/ordBetareg. To see how to fit the model with R package `ordbetareg`, which includes a full range of regression model options including multilevel/hierarchical models. I thank Adam Slez, Stijn Masschelein, Jonas Kristoffer Lindeløv, Matti Vuorre and anonymous reviewers for helpful comments.]"
---

```{r setup, include=F}

require(cmdstanr)
require(dplyr)
require(rstanarm)
require(tidyr)
require(lubridate)
require(loo)
require(kableExtra)
require(bayesplot)
require(patchwork)
require(latex2exp)
require(haven)
require(ggplot2)
require(posterior)
require(brms)

knitr::opts_chunk$set(echo=F,warning=F,message=F)

source("helper_func.R")

set.seed(665133)

zoib_model <- cmdstan_model("zoib_nophireg.stan")
ord_Beta_mod <- cmdstan_model("beta_logit.stan")
ord_Beta_mod_infl <- cmdstan_model("beta_logit_infl_simple.stan")
ord_Beta_mod_phi <- cmdstan_model("beta_logit_phireg.stan")
frac_logit <-cmdstan_model("frac_logit.stan")

# whether to fit all models
run_model <- T

# to reproduce results, change if you want fresh sampling

random_seed <- 77123101

# if you want to re-run the simulation, use this line of code
# note that the simulation can take a long time depending on cores (~2 days)

# source("ordered_Beta_reg_sim.R")

```

\newpage

Although scholars often collect data that is continuous in nature within set bounds, statistical models are not easily designed to handle this design. For example, in the social, medical and behavioral sciences there has been increasing usage of slider and visual analog scales as a way to capture nuanced information from human respondents [@cooper2014; @roster2014].

However, despite the increasing popularity of this type of data, the approaches in the statistical literature have significant limitations. While practitioners often fall back on ordinary least squares regression (OLS) as a convenient and easily interpretable way to analyze the data, the uni-modal, unbounded Normal distribution cannot capture important characteristics of the sample. More recently, scholars have turned to the Beta regression model [@ferrari2010] as a more flexible continuous distribution. Although the Beta distribution is defined over the (0,1) interval, it can be used on any continuous distribution within fixed bounds because it is straightforward to normalize any distribution to the (0,1) interval.

While this clever parameterization permits powerful inference on these bounded, potentially bi-modal distributions, the Beta regression's main flaw is that it cannot model observations that are degenerate, i.e., equal to the lower bound of 0 or the upper bound of 1. Various solutions have been proposed, such as transforming the outcome so that all observations are strictly between 0 and 1 [@verk2006], and more recently, modeling 0s and 1s through separate processes via zero-or-one inflated Beta regression [@ferrari2012] or the zero-and-one Beta regression model [@liu2015; @swearingen2012], which is also known as the ZOIB. However, as I show in this paper, transforming the outcome can inadvertently have important consequences for model results, while the ZOIB has issues with over-fitting the data by fitting multiple sets of parameters to degenerate (bounded) and continuous responses separately.

A new approach is given in this paper that seeks to capture the best points of the existing approaches while permitting straightforward and efficient inference. To do so, I employ ordered cutpoints, similar in spirit to an ordered logit model, to estimate the joint probability of 0s (the lower bound), continuous proportions, and 1s (the upper bound) in bounded continuous data. As only one predictive model is used for all of the outcomes, the effect of covariates is identified across degenerate and continuous observations without resulting in over-fitting. The use of cutpoints permits the model to fit distributions with mostly degenerate observations or no degenerate observations at all, which makes it a general solution to this problem.

To compare this new model, I estimate a Monte Carlo simulations to examine how existing approaches compare along an array of criteria. The results show that ordered Beta regression estimates marginal effects much more precisely than competitors, resulting in higher power, while performing admirably on measures of overall fit. Furthermore, the simulation shows that the alternative of transforming values to avoid boundaries in Beta regression can have difficult-to-predict implications and should be avoided.

To apply the model, I replicate the @aidtWorkersWorldUnite2014 influential study on the spread of suffrage in Europe as a function of revolutions in neighboring countries. While the proposed estimator supports the main findings of the study, there are wide discrepancies in terms of the efficiency of estimators and for some covariates, both sign and significance. I show in this section how the ZOIB can over-fit the data by allowing for too much heterogeneity between degenerate and continuous responses, resulting in marginal effects which are substantively difficult to interpret.

# Background

Bounded continuous data with observations at the bounds is common across scientific fields. While there are too many possible applications to list here, bounded scales have been employed prominently in medical pain research via VAS scales [@roster2014; @Myles2017], and slider scales more generally in psychological research [@monk1989; @Lee1991], and political science and economics [@cooper2014; @Nelson2008; @Liu2015]. Dose-response relationships often have lower and upper bounds [@prentice1976], and the analysis of chemical concentrations likewise involves bounds with continuous values [@fisher2020; @ritz2015]. As such, it is an important empirical domain for applied statistical analysis, and one that remains an active subject of research.

The default approach for modeling this type of variable is OLS regression as the variable is at least in part continuous. OLS, as the maximum entropy estimator for any continuous distribution with finite variance [@jaynes2003], is likely to capture at least some of the relevant features of the distribution. The more "Normal" the distribution, the more likely this approximation will give interpretable answers. However, @verk2006 raise important questions about this application of OLS to upper and lower-bounded dependent variables. They argue that OLS' failure to capture higher-order moments of the distribution represents a serious shortcoming because these moments, such as skewness and variance, may well affect what can be learned from the model and even the theoretical questions one can ask. One variation is to use a "quasi-likelihood" estimator called fractional logit [@papke1996] in which the trial parameter in the Bernoulli distribution is modeled as a continuous response with the logit link function. The primary drawback of fractional logit is that it is not itself a statistical distribution, and so the estimates' uncertainty can be difficult to define. As I show later, the estimator also appears to be heavily affected by the ratio of continuous to degenerate (i.e. at the bounds) responses in the outcome.

More recently, Beta regression [@ferrari2010] has become an increasingly popular technique because it is a model explicitly designed for bounded continuous data. The main drawback of Beta regression, as mentioned earlier, is that it cannot handle observations at the bounds given that it is a distribution of probabilities. There are two straightforward ways to handle degenerate values within the Beta regression framework. The first is to simply drop these responses and model the remaining data. If the count of 0s and 1s in the data is small, this strategy would seem reasonable. A more sophisticated strategy is to normalize the outcome within a set of bounds that are close to, but never equal to, one. The formula from @verk2006 that has received considerable attention from researchers is as follows. For a given outcome $y_i \in [0,1]$, $i \in \{1,2, ... N\}$, define a normalized outcome $y_j$:

$$
y_j = \frac{y_i(N-1) + 0.5}{N}
$$

The distribution of $y_i$ is nudged so that the values can never reach 0 or 1, and the amount of the nudge is determined by $N$. A larger sample will result in extreme values that are closer to the boundaries. This transformation permits Beta regression modeling without any need for further modeling choices. As such, it is a computationally simple and straightforward solution. However, it is not immediately clear what ramifications this transformation has on inferences as the transformation is non-linear and varies with sample size.

Finally, the most recent development in this vein is the zero-one inflated Beta regression (ZOIB) model, an approach that this paper builds upon. While @ferrari2012 proposed a zero-or-one inflated Beta regression, where a degenerate process could be used to model either category separately, @liu2015 show how to model both 0s and 1s in a zero-one inflated Beta regression model with two distinct processes for degenerate responses. In this paper, I focus on the @liu2015 parameterization because it is possible to estimate effects of independent variables on the full scale of the DV, i.e., lower bounds, continuous values and upper bounds. I return to the @ferrari2012 model in the discussion.

I first detail the @liu2015 parameterization as a useful starting place. It is important to note that this model and the model I present later both assume there are qualitative differences between degenerate (i.e., responses at the bounds) and continuous values of the DV. By contrast, OLS is a model that does not differentiate between the bounds and the continuous values, unless such a distinction happens via post-hoc processing [@horraceResultsBiasInconsistency2006]. However, the way in which the two Beta-based models interpret this distinction varies and has an important implication for estimation and inference.

Given a bounded continuous response $y_i$ with observations at the bounds of the scale, the ZOIB estimates three separate probabilities, which I label as $\alpha$ for $Pr(y_i=0)$, $\gamma$ for $Pr(y_i=1)$, and $\delta$ for $Pr(y_i>0 \cap y_i<1)$. Given these probabilities, we can define a conditional distribution over $y_i$ that depends on the realization of $y_i$ in these three mutually exclusive outcomes, along with parameters $\mu$ and $\phi$ to model the continuous outcomes via the Beta distribution (defined below):

```{=tex}
\begin{equation}
f(y_i|\alpha,\gamma,\delta,\mu,\phi) = \left\{\begin{array}{lr}
\alpha & \text{if } y_i=0\\
(1-\alpha)\gamma & \text{if } y_i=1\\
(1-\alpha)(1-\gamma)\text{Beta}(\mu,\phi) & \text{if } y_i \in (0,1)\\
\end{array}\right\}
(\#eq:zoib)
\end{equation}
```
where the Beta distribution is defined in as follows:

```{=tex}
\begin{equation}
f(y_i, \omega,\tau) = \frac{\Gamma(\omega + \tau)}{\Gamma(\omega)\Gamma(\tau)}y_i^{\omega-1}(1-y_i)^{\tau-1} 
(\#eq:Beta)
\end{equation}
```
To directly estimate the mean of the Beta distribution, we can substitute the parameters $\mu$ and $\phi$ (precision) for the shape parameters $\omega$ and $\tau$:

```{=tex}
\begin{align}
\mu = \frac{\omega}{\omega+\tau}\\
\phi = \omega + \tau
\end{align}
```
Returning to the ZOIB model in \@ref(eq:zoib), we can see that the Beta distribution is being deflated by the probabilities $\alpha$ and $\gamma$ such that the contribution of the Beta distribution cannot exceed $(1-\alpha)(1-\gamma)$, i.e., $\delta$. To parameterize the model, we can include regressors $\alpha = g(X'\beta_\alpha)$, $\gamma = g(X'\beta_\gamma)$, and $\mu=g(X'\beta_\delta)$ for a given matrix of covariates $X$. These linear models are rescaled with the inverse logit function $g(\cdot)$ to map on to $(0,1)$. While the covariates $X$ for each of the sub-models could be different or shared, the parameters $\beta_\alpha$, $\beta_\gamma$, and $\beta_\delta$ need to be distinct for each sub-model as the three processes are functionally independent. To make the estimates comparable to OLS and ordered beta regression, I only consider a version of the model with identical covariates in all three sub-models.

It is important to note that the ZOIB sub-models are not ordered, i.e., they are exchangeable. While this point is rather subtle, it is very important for the modeling exercise that follows. It is possible in this model for $Pr(y_i=0)$ and $Pr(y_i=1)$ to both increase independently of $Pr(y_i>0 \cap y_i<1)$. This independence can isolate heterogeneity in either end of the slider scale such that the decision to choose a 1 or a 0 are distinct choices with no necessary connection to each other, which is similar in principle to a selection model like a hurdle [@zorn1998].

This exchangeability, however, comes at a cost of over-fitting the distribution if the underlying processes are in fact generated by the same process. As the number of covariates $X$ increases, the number of parameters necessarily triples (assuming that all covariates are used to predict all parts of the model). The independence between sub models means that $X$ could positively predict $Pr(y_i=0)$ and negatively predict $Pr(y_i=1)$ in the same model without contradiction. If the degenerate outcomes represent a fundamental different process versus the continuous outcomes, than modeling this heterogeneity is necessary, but otherwise it introduces a substantial additional level of complexity.

<!--# It is not necessary, of course, to have identical parameters for each component in the ZOIB. Model complexity could be reduced if only a subset of the parameters were used for the models at the bounds, i.e. $Pr(y_i=0)$ and $Pr(y_i=1)$. However, doing so would not permit an interpretation of the effect of a predictor over the full combined distribution, which is the research question this paper focuses on. For these reasons, I only consider a ZOIB in which all parameters appear in all parameterizations.   -->

<!-- As such, the model seems to go beyond what many practitioners want from the model, which is the ability to evaluate the ability of covariates of interest to affect the combined distribution of $y_i$. The functional independence of outcomes, while a potentially powerful way to examine in depth the processes that lead to degenerate outcomes, is not necessarily the object of interest to scholars who collect this data. The probabilities of the three categories do not need in general  need to be independent since the same set of respondents in a survey or treatment group are all employing the same scale to produce the outcome in question. -->

# Model

To resolve the problem of over-parameterization in the ZOIB, I present *ordered* Beta regression. The main difference between this approach and the ZOIB model is to induce dependence between the three probabilities $\alpha$, $\gamma$ and $\delta$. To do so, I borrow ideas from the literature on the ordered (cumulative) logit model [@ologit1980]. 

To  give the intuition behind the model, I first define it in terms of measuring individual preferences. Suppose data is being collected on the preferences of individual $i$ over an outcome that is continuous in $\mathbb{R}^1$. We can denote these latent preferences as $y_i^*$. We collected a set of measurements $y_i$ for each individual in our sample concerning their preferences. However, because we cannot represent $\mathbb{R}^1$ in a limited space, we can only collect data using a bounded scale, resulting in a biased estimate of true preferences $y_i^*$ for those with preferences above or below the bounds of the scale.

While we cannot directly observe $y_i^*$, we can obtain estimates of our uncertainty in using a bounded scale as a stand-in for true preferences. We can think of the distribution of our measurements $y_i$ as being a realization of a latent cumulative distribution function $F(y_i^*)$ defined by a set of ordered cutpoints $\{k1,k2: k1<k2\}$. Each cutpoint represents the point at which the bounds of the observed scale become more likely than the continuous values. The farther past the cutpoint an individual's preferences are, the more likely that individual will provide a measured value of $y_i$ at the bound. Providing estimated locations for these cutpoints in $F(y_i^*)$ will permit us to also estimate the approximate intensity of preferences that are lost in the data measurement process, and consequently correct for bias in the reported data.

Importantly, because we are interested in a single latent scale of preferences $y_i^*$, we can also have a single set of regressors $X'\beta$ that predict this latent variable, $y^*_i = g(X'\beta)$. At very low values of $y_i^*$ below $k_1$, we observe a degenerate outcome $y_i=0$, and at very high values of $y_i^*$ above $k_2$ we observe a degenerate outcome $y=1$. For intermediate values of $y_i^*$, we observe the Beta-distributed outcome $y_i \in (0,1)$, which we can assume is unbiased over its range (i.e., strictly within the bounds).

Using $y_i^*$ and cutpoints $k_1$ and $k_2$, we can now re-define the probabilities $\alpha$,$\gamma$ and $\delta$ where:

```{=tex}
\begin{equation}
\left\{\begin{array}{lr}
\alpha  = 1 - g(X'\beta - k_1)\\
\delta  = \left[g(X'\beta - k_1) - g(X'\beta - k_2) \right ] \text{Beta}(g(X'\beta),\phi)\\
\gamma = g(X'\beta - k_2)\\
\end{array}\right\}
(\#eq:redef)
\end{equation}
```
We can see that we can still obtain the necessary probabilities $\alpha$ ($Pr(y_i=0)$), $\gamma$ ($Pr(y_i=1)$) and $\delta$ ($Pr(y_i>0 \cap y_i<1)$) to combine the 0s, 1s, and proportions into a single distribution. However, unlike in \@ref(eq:zoib), the probabilities are no longer exchangeable, but rather ordered due to the cutpoints. The position of the cutpoints $k_1$ and $k_2$ will affect each outcome by either decreasing or increasing the probability of that outcome occurring. As $k_1$ increases, $Pr(y_i=0)$ must increase and $Pr(y_i>0 \cap y_i<1)$ must decrease. Similarly, an increase in $k_2$ will necessarily make $Pr(y_i=1)$ decrease and increase $Pr(y_i>0 \cap y_i<1)$.

In terms of latent utility, the position of the cutpoints defines the potential loss of information about an individual. If the cutpoints increase, then there would seem to be more bunching around the end of the observed scale and consequently only higher levels of $y_i^*$ are associated with observations at the bounds. As the cutpoints decrease towards zero, there is no clear break in the distribution around the bounds and consequently little reason to expect significant bias. As can be seen in \@ref(eq:redef), if both $k_1=0$ and $k_2=0$, then the probability of $\delta$, or a continuous observation, is simply equal to $\text{Beta}(g(X'\beta),\phi)$, i.e. there is no need to adjust the observed data and the probability of a degenerate outcome increases in tandem with the continuous outcomes.

This is another important distinction between the ordered beta regression and the ZOIB model--both treat responses at the bounds as qualitatively different than the continuous responses in some sense, but the ordered beta regression allows the extent of difference to be a smooth function of the data. As I describe in the Discussion section, the ZOIB can be more helpfully thought of as a type of selection model in which cases can select into and out of continuous responses. In this framework, a degenerate response of 0 is equally distinct from all continuous  responses regardless of their proximity to 0. By contrast, the ordered beta model is more appropriate when the outcome is a single, possibly latent, scale, and the bounds of the scale are qualitatively different from continuous responses mainly in terms of strength or intensity, not type.

The implication of using ordered cutpoints means that only two parameters, the cutpoints themselves, are required in addition to the auxiliary parameter $\phi$. Thus only two parameters more than OLS are required to fit this model, improving considerably efficiency and information retrieval relative to the ZOIB, as will be shown in the simulations. It is possible to expand the model by permitting thresholds to vary by groups, as in the graded response model [@samejima1997], although I do not consider those extensions here. This type of heterogeneity could improve model fit, but sample-average cutpoints are appropriate when subgroup-level differences are not the primary aim of the analysis.

To consider the model from a Bayesian perspective, I assign weakly informative priors to the parameters. I define these as:

```{=tex}
\begin{align}
\beta &\sim N(0,5)\\
\phi &\sim E(.1)\\
s(\mathbf{K},0) &\sim D(\mathbf{1})
(\#eq:prior)
\end{align}
```
I use what is known as an induced Dirichlet prior over the set of cutpoints $K$ in which the cutpoints are converted to a vector of probabilities via a sigmoid function $s(\cdot)$ [@betancourtOrdinalRegression2019]. A vector of 1s is passed to the Dirichlet distribution to assume that the probability of each sub-model, 0, 1 and the Beta model, is equal a priori. Similarly, the prior on $\beta$ is weakly informative on the logit scale. The prior on $\phi$ is an exponential distribution with a rate parameter of .1. This prior would need to be adjusted if $\phi$ took on very large values, which could occur if the distribution became highly centered on a single point.

More informative priors for the cutpoints could be useful in a repeated sampling situation in which observations at the bounds may or may not be present in a given sample. An informative cutpoint prior could stabilize inferences across samples by informing model estimates that observations at the bounds are likely to occur even if they are not in the sample at hand.

<!-- The exponential prior on $\phi$ similarly puts prior mass on a wide range of values due to its long tail. The priors also permits the model to fit distributions that do not have observed degenerate values, such as in a repeated sampling situation where degenerate values may or may not appear in a given sample. -->

It is possible to further parameterize $\phi$ to model higher moments in the distribution. A lower value of $\phi$ for a given value of $\mu$ is associated with a more dispersed distribution, either 0, 1 or both depending on the value of $\mu$. This kind of information can be useful to analyze in a context where understanding which subjects tend to choose middle versus extreme values is a research question of interest. To do so we simply replace $\phi$ with a set of regressors $\beta_\phi$ and a covariate matrix $X$ (the covariates could be shared or different from those used to predict the mean of the distribution). The ability to model $\phi$ is an important advantage of employing the Beta distribution as it permits inferences on the nature of multi-modal distributions in ways that OLS and other alternatives cannot.

In the supplementary information I write out the full log-likelihood and joint posterior of the model.

# Estimation

Estimation of the model is done using Hamiltonian Markov Chain Monte Carlo with the software Stan [@CarpenterGelmanHoffmanEtAl2017]. The model converges fairly rapidly with less than 1,000 iterations on simulated data. In addition to sampling the model above, I also draw from the posterior-predictive distribution of $y_i$, denoted $\int_\Theta p(\tilde{y_i}|\theta)p(\theta|y_i)\text{d}\theta$, conditional on the posterior estimate of the model parameters (denoted $\theta$) for a given number of MCMC draws $S$. To do so I first sample a categorical outcome $y_{repO} \in \{1,2,3\}$ based on an ordered categorical tuple of the probabilities $\alpha$, $\gamma$ and $\delta$:

```{=tex}
\begin{equation}
y_{repO}^s \sim \text{Cat}(\{1,2,3\},\{\alpha^s,\delta^s,\gamma^s\})
(\#eq:yrepo)
\end{equation}
```
If $y_{repO}^s$ is equal to 1 or 3, then assign 0 and 1 respectively to $y_{rep}^s$:

```{=tex}
\begin{align}
y_{rep}^s = 0& \text{ if } y_{repO} = 1\\
y_{rep}^s = 1& \text{ if } y_{repO} = 3
\end{align}
```
I then draw from the Beta distribution if $y_{repO}=2$.

```{=tex}
\begin{equation}
y_{rep}^s \sim \text{Beta}(\mu_s,\phi_s) 
\end{equation}
```
In addition to this predictive distribution, I also examine measures of model fit. I use an estimate of leave-one-out (LOO) predictive density in which the posterior predictive distribution is evaluated by dropping a data point $y_i$, estimating the model, and predicting the held out $y_i$. Given that this measure is computationally challenging with Bayesian inference, I employ an approximation from @vehtari2016, the Pareto-stabilized important sampling (PSIS)-LOO predictive density:

```{=tex}
\begin{equation}
\hat{elpd}_{psis_loo} = \sum_{i=1}^N \text{log } \left( \frac{\sum_{s=1}^S \omega_i^s p(y_i|\theta^s)}{\sum_{s=1}^S \omega_i^s} \right)
(\#eq:psis)
\end{equation}
```
The $\omega_i$ are weights derived from importance sampling of the joint posterior for each data point $y_i$ and smoothed by the Pareto distribution to account for outliers. The resulting quantity can be interpreted as the log density of a future dataset $\tilde{y}_i$ from the "true" data-generating process. Higher values of the metric indicate a better fit to the DV. Importantly, this quantity can be evaluated on any of the models discussed so long as the same data are used to fit the model. In addition, this calculation yields an estimate of the effective number of parameters in each model, which is an indicator of relative model complexity.

Finally, I also estimate sample average marginal effects for each parameter $c$ in $\beta$ on the expected value of $y_i$. I use sample average marginal effects because it is a relatively model-neutral way of understanding the effect of coefficients on the response. For example, the ZOIB produces three sets of coefficients for each predictor, but only one sample average marginal effect per predictor. I evaluate these marginal effects through numerical differentiation of $\frac{\partial E(y_i|\beta_{-c},K)}{\partial \beta_c}$, iterating over all elements $c$ in $\beta$ [@leeper2017]. I suppress $\phi$ in the notation because it does not by definition factor into the calculation of the expected value.

I can then evaluate inferential properties of the different models in terms of the true marginal effect versus the estimated marginal effects. I calculate M-errors for the magnitude of bias and S-errors for the proportion of draws where the model estimates the wrong sign of the marginal effect [@gelman2014types]. I calculate coverage rates as the proportion of draws in which the model's 5% to 95% posterior interval includes the true marginal effect.

# Simulations

To compare the models, I simulate data in a manner consistent with the distribution by using the formula described above. Because the results of simulations can be sensitive to the particular values of parameters, I draw from a broad range of possible values to simulate the ordered Beta regression model. For a given number of covariates $q$, level of correlation between covariates $\rho_q$, vector of covariate effects $\mathbf{\beta_q}$, , scalar precision parameter $\phi$, and cutpoints $k \in \{1,2\}$, these ranges are defined as:

```{=tex}
\begin{align}
\phi &\sim U(0.5,30)\\
q &\sim U(1,15)\\
\rho_q &\sim U(0,0.9)\\
\mathbf{\beta_q} &\sim U(-5,5)\\
k_1 &\sim U(-10,-1)\\
k_2 &\sim k_1 + U(0.5,10)
(\#eq:simval)
\end{align}
```
The total number of observations $N$ is also sampled from a uniform distribution between 100 and 4000. For this simulation, 10,000 independent variates were drawn from the distributions above. Because the broad bounds on the parameters permit relatively sparse distributions, such as no zeros or ones (or no continuous values), any set of parameter values that could not produce at least five observations from the zeroes, ones and continuous parts of the distribution was discarded as the ZOIB is difficult to estimate with that level of sparsity. I also report a simulation in the appendix in which the values of these parameters are fixed and I vary sample size $N$ to permit more precise inference at specific values.

In addition to the ordered Beta regression model specified above, four other models were fit to the data. Two kinds of Beta regression models were fit, one using the transformation specified in @verk2006 and the second only using the continuous values of the distribution (discarding degenerate outcomes). I also included the ZOIB model specified above and a Bayesian version of OLS. Finally, I added a Bayesian version of the fractional logit model using the @papke1996 parameterization.

All of the estimated values came from Hamiltonian Markov Chain Monte Carlo chains using Stan [@CarpenterGelmanHoffmanEtAl2017] to permit comparability. All models are run with two chains of 1,000 iterations each with 500 samples discarded as warmup in each chain. The use of the same sampler ensures that the results are comparable across models. The Beta regression models and the OLS regression were fit with `brms` [@bürkner2017].

The simulation results are reported in Table \@ref(tab:comparebase) by model and summary statistic calculated from the Monte Carlo draws. As can be seen, the ordered Beta regression has the highest coverage, as would be expected. The ZOIB model has a marginally lower coverage rate, while OLS and fractional logit are both approximately 10% lower. The two types of Beta regressions have quite low coverage rates, below 50% of the simulation draws.

The reported M-errors [@gelman2014types] in Table \@ref(tab:comparebase) are defined as the ratio of the estimated marginal effect to the true marginal effect. This statistic can capture bias across the distributions, as a completely un-biased estimator would equal exactly 1. As can be seen, in finite samples this is not so, although the ordered Beta regression model approaches 1. While the ZOIB, OLS and fractional logit all have noticeable bias, bias is much worse for the Beta regression models. Similarly, S-errors, which are defined as the probability of estimating the wrong sign relative to the true marginal effect, are three to four times more common in other models relative to ordered beta regression.

In terms of model complexity, the effective number of parameters of ordered beta regression is quite similar to OLS, but half that of the ZOIB. Interestingly, fractional logit and the beta regressions have fewer effective parameters than ordered beta regression, implying these models are actually under-specified. Higher $\hat{elpd}_{PSIS-LOO}$ values likewise correctly select ordered beta regression as the "true" model and the ZOIB as a second competitor, which would be expected given the similar nature of these models.

However, the closeness in $\hat{elpd}_{PSIS-LOO}$ values only measures similar fit to the response variable rather than inference. The ZOIB and ordered beta regression differ significantly in the amount of uncertainty in estimating the marginal effect, Table \@ref(tab:comparebase) shows that the average variance of the posterior density for ordered beta regression sample average marginal effects is far less than the ZOIB and other alternatives.

I further explore differences between models by examining these statistics as a function of sample size $N$ in Figure \@ref(fig:lookurt). I also include power as a separate calculation, i.e. the probability of detecting an effect different from zero (defined as a posterior interval with the same sign as the marginal effect). As can be seen, ordered beta regression out-performs on these key inferential metrics, and the difference is more pronounced when sample sizes are less than 1,000. Above 1,000, differences are smaller although they still persist. Interestingly, the simulation does not show that M-errors decrease noticeably with larger samples.

<!-- Immediately below the M-error panel is the S-error panel, which is defined as the proportion of estimated marginal effects that have a different sign than the true marginal effect [@gelman2014types]. Sign errors are relatively unlikely and the models do not seem to diverge significantly on this statistic, possibly because there is not enough power to separate the models' performance. -->

<!--# The bottom right panel shows root mean square error (RMSE). In this metric, ordered Beta regression marginally out-performs competitors, with fractional logit a close second. The Beta regression on continuous values is artificially lower on this metric because extreme values were excluded, and the transformed Beta regression shows poorer performance because RMSE was evaluated on the original (untransformed) outcome. Somewhat surprisingly, OLS shows a higher RMSE than other models, possibly because it can predict outside of the $[0,1]$ interval.  -->

<!--# The middle and lower-left panels in the figure show results for $\hat{elpd}_{PSIS-LOO}$ (the statistic cannot be calculated for the Beta regressions because the outcome must be the same across models). Both ordered Beta regression and the ZOIB model have high $\hat{elpd}_{PSIS-LOO}$ scores, although ordered beta regression is the highest overall and also is the most likely to be ranked as having the highest predictive validity among alternatives. The wide disparity in average $\hat{elpd}_{PSIS-LOO}$ is due to the fact that sometimes fractional logit and OLS spectacularly fail to capture the distribution. Figure @ref(fig:lookurt) examines average $\hat{elpd}_{PSIS-LOO}$ values by the kurtosis of the simulated distribution. As can be seen, when kurtosis is very low, which would correspond to a functionally bi-modal distributions, OLS fits the data very poorly, but when kurtosis increases such that the distribution is unimodal, OLS' predictive validity converges to that of the ordered beta.  -->

```{r comparebase,fig.cap="Average Simulation Performance Results",fig.width=6,fig.height=8}

require(forcats)

#all_sim <- readRDS("data/sim_cont_X.rds")
# compatability with Dataverse
load("sim_cont_X.RData")
checkc <- sapply(all_sim, class)
all_sim <- all_sim[checkc!='try-error']
all_sim <- bind_rows(all_sim) %>% 
  unchop(c("med_est","X_beta","marg_eff","high","low","var_calc",
           'marg_eff_est','high_marg','low_marg','var_marg'))

my_conf_fun <- function(x) {
  if(all(x>=0 & x<=1)) {
    # use binomial confidence intervals
    bi_ci <- binom::binom.bayes(sum(x,na.rm=T),length(x))
    return(list(y=bi_ci$mean,
                  ymin=bi_ci$lower,
                  ymax=bi_ci$upper))
  } else {
    boot_ci <- boot::boot(x,statistic=function(x,i) median(x[i]),R=1000)
    boot_cov <- boot::boot.ci(boot_ci, type="basic")
    return(list(y=boot_ci$t0,
                  ymin=boot_cov$basic[1,4],
                  ymax=boot_cov$basic[1,5]))
  }
}


# need to remove an outlier (rmse shouldn't be greater than 1)

#all_sim <- filter(all_sim,rmse<1)

all_sim_calc <- all_sim %>% 
  mutate(s_err=sign(marg_eff)!=sign(marg_eff_est),
            m_err=abs(marg_eff_est)/abs(marg_eff),
         cov=ifelse(marg_eff>0,marg_eff<high_marg & marg_eff>low_marg,
                    marg_eff<high_marg & marg_eff>low_marg),
         loo_val=ifelse(model %in% c("Beta Regression - (0,1)",
                                     "Beta Regression - Transformed"),
                        NA,loo_val),
         model=recode(model, `Ordinal Beta Regression`="Ordered Beta Regression")) %>% 
  group_by(N,marg_eff) %>% 
  mutate(var_marg=var_marg/var_marg[model=="Ordered Beta Regression"]) %>% 
  ungroup %>% 
  select(RMSE="rmse",`ELPD LOO`="loo_val",
         `Proportion S Errors`="s_err",
         `M Errors`="m_err",`Variance (% Ordered)`='var_marg',
         `5% - 95% Coverage`="cov",`No. Parameters`="p_loo",model) %>% 
  gather(key = "type",value="estimate",-model) %>% 
  filter(!is.na(model)) %>% 
  group_by(model,type) %>% 
  summarize(boot_all=my_conf_fun(estimate[!is.na(estimate)]),
            lower=boot_all$ymin,
            med_est=boot_all$y,
            upper=boot_all$ymax) %>% 
  select(-boot_all) %>% 
  distinct %>% 
  ungroup

  all_sim_calc %>% 
  mutate_at(c("lower",'upper','med_est'),~signif(.,digits=3)) %>% 
  mutate_at(c("lower",'upper','med_est'),~ifelse(type %in% c("Proportion S Errors",
                                                             "5% - 95% Coverage",
                                                             "Variance (% Ordered)"), 
                                                 paste0(.*100,"%"),.)) %>%
  mutate(model=factor(model),
         model=fct_relevel(model,"Ordered Beta Regression","ZOIB"),
         model=recode(model, `Ordered Beta Regression`="Ordered Beta Regression")) %>% 
    filter(!(model %in% c("Beta Regression - (0,1)",
                                     "Beta Regression - Transformed") & type=="ELPD LOO")) %>% 
  arrange(type, model) %>% 
  select(Statistic="type",
         Model="model",
         `5%`="lower",
         `Median`="med_est",
         `95%`="upper") %>% 
  kable(caption="Comparison of Simulation Diagnostics",
        label = "comparebase",
        booktabs=T,longtable=T) %>% 
 kable_styling(latex_options = c("hold_position","repeat_header"),
               font_size=8) %>% 
  collapse_rows(columns=1,latex_hline = "major",
      valign="top")
  
  #   mutate(hline=case_when(type=="M Errors"~1,
  #                        type=="5% - 95% Coverage"~.90,
  #                        TRUE~NA_real_)) %>% 
  # ggplot(aes(y=estimate,x=model)) +
  # stat_summary(fun.data=my_conf_fun,geom="pointrange", na.rm=TRUE) +
  # facet_wrap(~type,scales="free_x",dir="v") +
  # theme_minimal() +
  # geom_hline(aes(yintercept=hline),linetype=2, na.rm=TRUE) +
  # ylab("") +
  # xlab("") +
  # theme(panel.grid=element_blank(),
  #       panel.background = element_rect(fill = NA, color = "gray")) +
  # coord_flip()

```

The fixed simulation results, which are available in the supplemental information, show that for certain values of parameters, it is possible for the ZOIB to have lower variance in estimated sample average marginal effects than ordered beta regression. However, the decrease in variance is small while the increase in bias via M-errors is quite large: in the fixed simulation, the ZOIB estimates coefficients with less than half the magnitude of the true marginal effect.

As a result, even without looking at empirical examples, it is difficult to recommend OLS, fractional logit or transformed continuous outcomes as a general approach for analyzing this type of data. Fractional logit and OLS perform reasonably well at recovering marginal effects, but they have noticeably lower coverage rates. Transformation of the outcome to the $[0,1]$ interval is also clearly not a harmless strategy as it dramatically changes the estimated marginal effects. Discarding degenerate outcomes can likewise affect parameter estimates even though the continuous outcomes are predicted by the same linear model as the degenerate outcomes. All of the alternatives estimate marginal effects with substantially more uncertainty than the ordered beta model.

```{r lookurt,fig.cap="Bias in Estimates As a Function of Sample Size", message=F}

require(ggthemes)

all_sim %>%
  mutate(s_err=sign(marg_eff)!=sign(marg_eff_est),
            m_err=abs(marg_eff_est)/abs(marg_eff)) %>% 
  mutate(Power=as.numeric(ifelse(sign(marg_eff)==sign(high) & sign(marg_eff)==sign(low),
                                 1,
                                 0)),
         model=recode(model, `Ordinal Beta Regression`="Ordered Beta Regression")) %>% 
  select(`Proportion S Errors`="s_err",N,Power,
         `M Errors`="m_err",Variance="var_marg",model) %>% 
  gather(key = "type",value="estimate",-model,-N) %>% 
  #filter(model %in% c("OLS","Ordered Beta Regression",'ZOIB',"Fractional")) %>% 
  mutate(model=factor(model, 
                      levels=c("Ordered Beta Regression",'ZOIB',"OLS","Fractional"))) %>% 
    filter(!is.na(model)) %>% 
  ggplot(aes(y=estimate,x=N)) +
  #geom_point(aes(colour=model),alpha=0.1) +
  stat_smooth(aes(colour=model,linetype=model),method="gam",formula = y ~ s(x, bs = "cs")) + 
  ylab("") +
  xlab("N") +
  scale_color_viridis_d() +
  facet_wrap(~type,scales="free_y",ncol = 2) +
  labs(caption=stringr::str_wrap("Summary smooth calculated via generalized additive model. M Errors  and S errors are magnitude of bias and incorrect sign of the estimated marginal effect. Variance refers to estimated posterior variance (uncertainty) of the marginal effect.",width=80)) +
  guides(color=guide_legend(title=""),
         linetype=guide_legend(title="")) +
  theme_tufte()

```

```{=latex}
\begin{figure}[t]
```
```{r varN,echo=F}

require(ggthemes)
require(ggtext)

all_sim %>%
  filter(model=='ZOIB') %>% 
  select(Variance="var_marg",N,k,rho) %>% 
  ggplot(aes(y=Variance,x=rho)) +
  stat_smooth(aes(colour=k,group=k),se = F,alpha=.5,size=.5) +
  labs(x=TeX("$\\rho_x$"),y=TeX("$V\\[\\beta_x\\]$")) +
  scale_color_distiller(type="seq",palette="Reds",
                        direction=1,
                        name="No.\nCovariates") +
  guides(linetype=guide_legend(title="")) +
  theme_tufte()

```

```{=latex}
\caption{Bias in ZOIB Estimates As a Function of Number and Correlation of Covariates. \label{fig:pressure}}
\textit{Notes:} Summary smooth calculated via generalized additive model. Variance of $\beta_x$ refers to estimated posterior variance (uncertainty) of the marginal effect. Variance calculated conditional on simulated covariate correlation $\rho_x$.
\end{figure}
```
It is clear that even though ordered beta regression can be understood as a special case of the ZOIB, the simplified parameterization has real consequences for inference. As is seen in the statistics in Figure \@ref(fig:lookurt), the ZOIB can recover marginal effects and has high $\hat{elpd}_{PSIS-LOO}$ scores, but its model complexity is far higher and its marginal effects are substantially more uncertain. To make the distinction clearer, I show variance estimates of marginal effects from the ZOIB conditional on the correlation ($\rho_x$) and number of covariates in Figure \@ref(fig:pressure). As can be seen, variance increases considerably when the number of covariates and the correlation between covariates is high.

To distinguish the models in an applied setting, I next turn to an empirical example.

# Empirical Examples

To apply the model, I examine data from the @aidtWorkersWorldUnite2014 study of the extension of suffrage in Europe as a function of the geographical spread of democratic revolutions in neighboring countries. Their measure of suffrage is a 0 to 100 index, and they employ OLS as a primary model with fractional logit as a robustness check. I re-estimate one of their main panel data (country-year) specifications (model 5, Table 2) with ordered beta regression, transformed Beta regression and the ZOIB.

Because of the inclusion of a lagged dependent variable with limited over-time variation, I used the QR matrix decomposition to avoid poor sampling due to high posterior correlation [@stan2016]. I also do not employ the paper's standard error corrections as these do not have clear analogues in Bayesian inference, though the estimates for OLS are still quite similar to those in the original paper.

```{r loaddata,fig.cap="Suffrage Index from Aidt and Jensen (2012) with Estimated Cutpoint Locations"}

  require(readr)
  require(forcats)
  require(haven)

  model_data_long <- read_dta("EER-D-13-00718R2_maindata_suffrage.dta") %>% gather(key="year",value="dummy",yy1:yy57) %>% 
    mutate(year=factor(year),
           ccode=factor(ccode),
           year=fct_relevel(year,"yy1"),
           country=fct_relevel(country, "UK"))
  
  model_data_long <- mutate(model_data_long,
                            therm=(e2c - min(e2c,na.rm=T))/(max(e2c,na.rm=T) - min(e2c,na.rm=T)),
                            lage2c=(lage2c - min(e2c,na.rm=T))/(max(e2c,na.rm=T) - min(e2c,na.rm=T)),
                            therm_type=case_when(therm>0  & therm<1 ~ "inside",therm==0|therm==1~"outside",TRUE~'other'),
                            therm_rescale=(therm * (sum(!is.na(therm))-1) + 0.5)/sum(!is.na(therm)))
  
    model_spec <- therm ~ lage2c + revolution4b + laggdp + lagn + 
      year + lagurban + war + ww1 + lagenrollment + Gold_standard + country
  
    x_orig <- model.matrix(model_spec,
                    data=model_data_long)[,-1]
    
    model_data <- slice(model_data_long,as.numeric(row.names(x_orig)))
    
    # need to de-correlate x
    
    x_q <- qr(x_orig)
    x <- qr.Q(x_q)
    x_r <- qr.R(x_q)
    x_inv <- solve(x_r)

  model_data_prop <- filter(model_data,therm_type=="inside")
  model_data_degen <- filter(model_data,therm_type=="outside")
  
  # don't drop the intercept for the inflation model
  # X_prop_miss <- model.matrix(therm~pre_race + pre_sex + 
  #                               pre_age + pre_fed_response_covid + pre_party_id,
  #                             data=model_data_prop)
  # X_degen_miss <- model.matrix(therm~pre_race + pre_sex + 
  #                               pre_age + pre_fed_response_covid + pre_party_id,
  #                              data=model_data_degen)
  # X_prop_phi <- model.matrix(therm~pre_race + pre_sex + 
  #                               pre_age + pre_fed_response_covid + pre_party_id,
  #                            data=model_data_prop)
  # X_degen_phi <- model.matrix(therm~pre_race + pre_sex + 
  #                               pre_age + pre_fed_response_covid + pre_party_id,
  #                             data=model_data_degen)
  
  
  if(run_model) {
    sample_all <- sample(1:nrow(x),size=floor(nrow(x)/10))
    saveRDS(sample_all,"sample_all.rds")
  } else {
    sample_all <- readRDS("sample_all.rds")
  }
  
  
  X_degen <- x[model_data$therm_type=="outside",]
  X_prop <- x[model_data$therm_type=="inside",]
  
  # need samples for outcomes
  
  sample_prop <- (1:nrow(X_prop))[which(model_data$therm_type =="inside") %in% sample_all]
  sample_degen <- (1:nrow(X_degen))[which(model_data$therm_type =="outside") %in% sample_all]

  to_bl <- list(N_degen=nrow(model_data_degen),
                N_prop=nrow(model_data_prop),
                X=ncol(X_prop),
                X_miss=0,
                infl_value=-1,
                outcome_prop=model_data_prop$therm,
                outcome_degen=model_data_degen$therm,
                covar_prop=X_prop,
                covar_degen=X_degen,
                covar_prop_infl=array(0,dim=c(nrow(model_data_prop),0)),
                covar_degen_infl=array(0,dim=c(nrow(model_data_degen),0)),
                N_pred_degen=length(sample_degen),
                N_pred_prop=length(sample_prop),
                indices_degen=sample_degen,
                indices_prop=sample_prop,
                run_gen=1)
  
  if(run_model) {
    fit_pew <- ord_Beta_mod$sample(seed=random_seed,
                        data=to_bl,parallel_chains=2,cores=2,
                        iter_warmup=500,
                        iter_sampling=500,
                        refresh=100)
  
    fit_pew$save_object("fit_pew.rds")
  } else {
    fit_pew <- readRDS("fit_pew.rds")
  }

tibble(therm=model_data$therm*100) %>% 
  ggplot(aes(x=therm)) +
  geom_histogram(bins=100) +
  theme_minimal() + 
  theme(panel.grid=element_blank()) +
  scale_x_continuous(breaks=c(0,25,50,75,100),
                     labels=c("0","Less Suffrage",
                              "50","More Suffrage","100")) +
  ylab("") +
  geom_vline(xintercept=mean(plogis(fit_pew$draws("cutpoints[1]")))*100,
             linetype=2) +
  geom_vline(xintercept=mean(plogis(fit_pew$draws("cutpoints[2]")))*100,
             linetype=2) +
  xlab("") +
  labs(caption=paste0("Figure shows the distribution of ",sum(!is.na(model_data$therm))," country-year suffrage index values.\nThe two vertical lines indicate the estimated cut points from an ordered Beta regression model\n where responses have a 0.5 probability of being considered degenerate (i.e. close to 0 or 1)."))

#ggsave("college_prof.png",width=5,height=3,units="in",scale=1.1)

```

A histogram of the dependent variable (normalized to [0,1]) is shown in Figure \@ref(fig:loaddata). The figure reveals significant bunching around the lower end of the scale, and more modest bunching at the upper end of the scale. I plot the estimated location of the cutpoints from an ordered Beta regression fit as vertical dashed lines over the histogram, showing that there is substantial reason to believe that the end points are qualitatively different from the continuous responses. As such, it is potentially a useful empirical application for the ordered beta regression model, as underlying the scale is a single-dimensional latent concept--inclusion in the electoral system--but also reason to believe that moving from a suffrage value of 0 to 0.1 (at least some inclusion) entails larger changes in the latent variable than a transition from 0.1 to 0.2 or 0.5 to 0.6.

```{r fitordreg,include=F}
  
# calculate marginal effects
  
  cutpoints_est <- as_draws_matrix(fit_pew$draws("cutpoints"))
  X_beta_ord <- as_draws_matrix(fit_pew$draws("X_beta"))
  yrep_ord <- as_draws_matrix(fit_pew$draws("regen_epred"))
  alpha_ord <- as_draws_matrix(fit_pew$draws("alpha"))
  
  X_beta_ord <- apply(X_beta_ord, 1, function(c) {
        x_inv %*% c
      })
  X_beta_ord <- t(X_beta_ord)
  
# iterate over columns to get marginal effects
# first for phi reg

mat_data <- rbind(x_orig[model_data$therm_type=="outside",],
                  x_orig[model_data$therm_type=="inside",])

# don't iterate over fixed effects

just_vars <- (1:ncol(mat_data))[!grepl(x=colnames(x_orig),pattern="year|country")]
  
# iterate over columns to get marginal effects for regression without phi reg
  
#mat_data_miss <- rbind(X_degen_miss,X_prop_miss)

all_vars_ord <- lapply(just_vars,function(c) {

  if(all(mat_data[!is.na(mat_data[,c]),c] %in% c(0,1))) {
    pred_data_high <- mat_data

    pred_data_high[,c] <- 0

    pred_data_low <- mat_data

    pred_data_low[,c] <- 1
  } else {
    pred_data_high <- mat_data

    pred_data_high[,c] <- pred_data_high[,c] + setstep(pred_data_high[,c])

    pred_data_low <- mat_data

    pred_data_low[,c] <- pred_data_low[,c] - setstep(pred_data_low[,c])
  }

  margin_ord <- sapply(1:nrow(X_beta_ord), function(i,this_col) {
    y0 <- predict_ordbeta(cutpoints=cutpoints_est[i,],
                          X=pred_data_low,
                          alpha=as.numeric(alpha_ord)[i],
                          X_beta=t(X_beta_ord[i,,drop=F]))

    y1 <- predict_ordbeta(cutpoints=cutpoints_est[i,],
                          X=pred_data_high,
                          alpha=as.numeric(alpha_ord)[i],
                          X_beta=t(X_beta_ord[i,,drop=F]))

    marg_eff <- (y1-y0)/(pred_data_high[,this_col]-pred_data_low[,this_col])

    mean(marg_eff)
  },c)

  tibble(marg=margin_ord,variable=colnames(mat_data)[c])
}) %>% bind_rows

```

```{r fitlm,include=F}

# Fit OLS
if(run_model) {
  pew_fit_lm <- brm(bf(model_spec,decomp="QR"),
                    data=model_data,
                      chains=2,
                    iter=1000,
                    cores=2,
                    backend="cmdstanr",
                      seed=random_seed)

  saveRDS(pew_fit_lm,"pew_fit_lm.rds")
} else {
  pew_fit_lm <- readRDS("pew_fit_lm.rds")
}

yrep_ols <- posterior_epred(pew_fit_lm)[,sample_all]
out_lm <- as.matrix(pew_fit_lm)[,-c(1,78,79,80)]
out_lm <- out_lm[,just_vars]
# save_names <- colnames(out_lm)
# # need to convert to real
# 
# out_lm <- apply(out_lm, 1, function(c) {
# 
#   x_inv %*% c
#   
# })
# out_lm <- t(out_lm)
# colnames(out_lm) <- save_names
# out_lm <- out_lm[,just_vars]

```

```{r fitzoib,include=F}

# fit the ZOIB
  
  if(run_model) {
    zoib_fit <- zoib_model$sample(data=list(n=nrow(x),
                                            y=model_data$therm,
                                            k=ncol(x),
                                            s=length(sample_all),
                                            sample_all=sample_all,
                                            x=x,
                                            seed=random_seed,
                                          run_gen=1),
                       parallel_chains=2,
                       cores=2,iter_warmup=500,
                       iter_sampling = 500,
                       refresh=0)
                       
    
    zoib_fit$save_object("zoib_fit.rds")
  } else {
    zoib_fit <- readRDS("zoib_fit.rds")
  }
  
  yrep_zoib <- zoib_fit$draws("zoib_epred") %>% as_draws_matrix()
  coef_a <- zoib_fit$draws("coef_a") %>% as_draws_matrix()
  coef_g <- zoib_fit$draws("coef_g") %>% as_draws_matrix()
  alpha <- zoib_fit$draws("alpha") %>% as_draws_matrix()
  X_beta_zoib <- zoib_fit$draws("coef_m") %>% as_draws_matrix()
  
  mat_data <- x_orig
  
  coef_a <- apply(coef_a, 1, function(c) {
        x_inv %*% c
        })
  coef_a <- t(coef_a)
  
  coef_g <- apply(coef_g, 1, function(c) {
        x_inv %*% c
        })
  coef_g <- t(coef_g)
  
  X_beta_zoib <- apply(X_beta_zoib, 1, function(c) {
        x_inv %*% c
        })
  X_beta_zoib <- t(X_beta_zoib)

all_vars_zoib <- lapply(just_vars,function(c,coef_a,coef_m,coef_g,alpha) {

  if(all(mat_data[!is.na(mat_data[,c]),c] %in% c(0,1))) {
    pred_data_high <- mat_data

    pred_data_high[,c] <- 0

    pred_data_low <- mat_data

    pred_data_low[,c] <- 1
  } else {
    pred_data_high <- mat_data

    pred_data_high[,c] <- pred_data_high[,c] + setstep(pred_data_high[,c])

    pred_data_low <- mat_data

    pred_data_low[,c] <- pred_data_low[,c] - setstep(pred_data_low[,c])
  }

  margin_ord <- sapply(1:nrow(X_beta_zoib), function(i,this_col) {

    y0 <- predict_zoib(X=pred_data_low,
                          coef_m=as.numeric(coef_m[i,]),
                          coef_a=as.numeric(coef_a[i,]),
                          coef_g=as.numeric(coef_g[i,]),
                          alpha1=alpha[i,1],
                          alpha2=alpha[i,2],
                          alpha3=alpha[i,3])

    y1 <- predict_zoib(X=pred_data_high,
                          coef_m=as.numeric(coef_m[i,]),
                          coef_a=as.numeric(coef_a[i,]),
                          coef_g=as.numeric(coef_g[i,]),
                          alpha1=alpha[i,1],
                          alpha2=alpha[i,2],
                          alpha3=alpha[i,3])

    marg_eff <- (y1-y0)/(pred_data_high[,this_col]-pred_data_low[,this_col])

    mean(marg_eff)
  },c)

  tibble(marg=margin_ord,variable=colnames(mat_data)[c])
},coef_m=X_beta_zoib,
coef_a,coef_g,
alpha) %>% bind_rows
  
```

```{r fitBetatrans,include=F}
  
# fit Beta - transformed
  
  
  if(run_model) {
    beta_trans_fit <- brm(bf(update(model_spec,therm_rescale~.),decomp="QR"),
                                 data=model_data,
                          chains=2,cores=2,iter=1000,
                          backend="cmdstanr",family="beta",
                                 seed=random_seed)
    
    saveRDS(beta_trans_fit,"beta_trans_fit.rds")
  } else {
    # rstanarm not working with reloaded models
    beta_trans_fit <- readRDS("beta_trans_fit.rds")
  }
  
  
  yrep_Beta_trans <- posterior_epred(beta_trans_fit)[,sample_all]
  
  # drop phi
  X_beta_trans <- as.matrix(beta_trans_fit)
  
  X_beta_trans <- X_beta_trans[,1:77]
  
#   X_beta_trans_new <- apply(X_beta_trans[,2:ncol(X_beta_trans)], 1, function(c) {
#         x_inv %*% c
#         })
#   X_beta_trans <- cbind(X_beta_trans[,1],t(X_beta_trans_new))
#   
#   
#   
# # marginal effects
#   
  
mat_data <- cbind(rep(1, nrow(x_orig)),x_orig)

# skip intercept

all_vars_Beta_trans <- lapply(just_vars+1,function(c,X_beta=NULL) {

  if(all(mat_data[!is.na(mat_data[,c]),c] %in% c(0,1))) {
    pred_data_high <- mat_data

    pred_data_high[,c] <- 0

    pred_data_low <- mat_data

    pred_data_low[,c] <- 1
  } else {
    pred_data_high <- mat_data

    pred_data_high[,c] <- pred_data_high[,c] + setstep(pred_data_high[,c])

    pred_data_low <- mat_data

    pred_data_low[,c] <- pred_data_low[,c] - setstep(pred_data_low[,c])
  }



  margin <- sapply(1:nrow(X_beta), function(i,this_col) {
    y0 <- predict_beta(X=pred_data_low,
                          X_beta=X_beta[i,])

    y1 <- predict_beta(X=pred_data_high,
                          X_beta=X_beta[i,])

    marg_eff <- (y1-y0)/(pred_data_high[,this_col]-pred_data_low[,this_col])

    mean(marg_eff)
  },c)

  tibble(marg=margin,variable=colnames(mat_data)[c])
},X_beta=X_beta_trans) %>% bind_rows

```

```{r fitfraclogit,include=F}


 if(run_model) {
    frac_fit <- frac_logit$sample(data=list(n=nrow(x),
                                            y=model_data$therm,
                                            k=ncol(x),
                                            x=x,
                                            seed=random_seed,
                                          run_gen=1),
                       parallel_chains=2,
                       cores=2,iter_warmup=500,
                       iter_sampling = 500,
                       refresh=0)
                       
    
    frac_fit$save_object("frac_fit.rds")
  } else {
    frac_fit <- readRDS("frac_fit.rds")
  }

   yrep_frac <- as_draws_matrix(frac_fit$draws("frac_rep")[,,sample_all])
   X_beta_frac <- as_draws_matrix(frac_fit$draws(variables="X_beta"))
   frac_int <- as_draws_matrix(frac_fit$draws(variables="alpha"))
   
   mat_data <- x_orig
   
   frac_int <- as_draws_matrix(frac_fit$draws(variables="alpha"))
   
   X_beta_frac <- apply(X_beta_frac, 1, function(c) {
        x_inv %*% c
        })
   X_beta_frac <- t(X_beta_frac)


all_vars_frac <- lapply(just_vars,function(c,X_beta_frac,frac_int,mat_data) {

  if(all(mat_data[!is.na(mat_data[,c]),c] %in% c(0,1))) {
    pred_data_high <- mat_data

    pred_data_high[,c] <- 0

    pred_data_low <- mat_data

    pred_data_low[,c] <- 1
  } else {
    pred_data_high <- mat_data

    pred_data_high[,c] <- pred_data_high[,c] + setstep(pred_data_high[,c])

    pred_data_low <- mat_data

    pred_data_low[,c] <- pred_data_low[,c] - setstep(pred_data_low[,c])
  }



    margin_frac <- sapply(1:nrow(X_beta_frac), function(i) {
    y0 <- plogis(c(frac_int[i,]) + pred_data_low %*% X_beta_frac[i,,drop=T])
    y1 <- plogis(c(frac_int[i,]) + pred_data_high %*% X_beta_frac[i,,drop=T])

    marg_eff <- (y1-y0)/(pred_data_high[,c]-pred_data_low[,c])

    mean(marg_eff)
  })

  tibble(marg=margin_frac,variable=colnames(mat_data)[c])
},X_beta_frac,
frac_int,mat_data) %>% bind_rows

```

```{r loo}

# compare all loos
# will  need to do manually

loo_ord <- fit_pew$loo("ord_log")
loo_zoib <- zoib_fit$loo("zoib_log")
loo_lm <- loo(log_lik(pew_fit_lm)[,sample_all])
loo_frac <- loo(frac_fit$draws("frac_log")[,,sample_all])
loo_trans <- loo(log_lik(beta_trans_fit)[,sample_all])
```

I fit each of the models in the simulation to the empirical data. Unfortunately, as I do not know the "true" sample average marginal effects, I cannot calculate M-error or S-error rates, nor can I directly compare the variance of the estimates to each other. However, I do calculate RMSE, $\hat{elpd}_{psis_loo}$, total marginal effect variance and the effective number of parameters, which is shown in Figure \@ref(fig:modcompare). It should be noted that the QR decomposition affected the number of parameters calculation, deflating the total number, especially for the ZOIB. Nonetheless, the ordered beta regression model is still noticeably less complex, and estimates marginal effects with remarkably higher precision. Only the Beta regression model on transformed values has higher precision, and as I explain later, this is most likely a sign of model mis-fit rather than an actual increase in information.

```{r modcompare, fig.cap="Comparison of Model Diagnostics for Replication of Aidt and Jensen (2012)",fig.width=5,fig.height=6}

# need to reduce iterations in zoib model

rmse_ord <- apply(yrep_ord,1,
                  function(c) sqrt(mean((c-c(to_bl$outcome_degen[sample_degen],
                                                to_bl$outcome_prop[sample_prop]))^2)))

rmse_ols <- apply(yrep_ols,1,
                  function(c) sqrt(mean(((c-model_data$therm[sample_all])^2))))

rmse_zoib <- apply(yrep_zoib,1,function(c) sqrt(mean((c-model_data$therm[sample_all])^2) ))

rmse_Beta_trans <- apply(yrep_Beta_trans,1,function(c) sqrt(mean((c-model_data$therm[sample_all])^2) ))

rmse_frac <- apply(yrep_frac,1,function(c) sqrt(mean((c-model_data$therm[sample_all])^2) ))

# kurt_ord <- apply(as_draws_matrix(fit_pew$draws("regen_all")),1,moments::kurtosis)
# 
# 
# kurt_ols <- apply(as.matrix(posterior_predict(pew_fit_lm)),1,moments::kurtosis)
# 
# kurt_zoib <- apply(as_draws_matrix(zoib_fit$draws("zoib_regen")),1,moments::kurtosis)
# 
# kurt_Beta_trans <- apply(as.matrix(posterior_predict(beta_trans_fit)),1,moments::kurtosis)
# 
# kurt_frac <- apply(yrep_frac,1,moments::kurtosis)

# do rmse + kurtosis

if(length(rmse_Beta_trans)<1000) {
  rmse_Beta_trans <- c(rmse_Beta_trans,rmse_Beta_trans)
}

applied_est <- tibble(Estimate=c(rmse_ord[1:1000],rmse_zoib[1:1000],rmse_Beta_trans,rmse_frac[1:1000],rmse_ols)*100,
       Model=rep(c("Ordered\nBeta","ZOIB","Transformed\nBeta","Fractional\nLogit","OLS"),
                 each=1000),
       Statistic="RMSE") %>% 
  group_by(Model,Statistic) %>% 
  summarize(med_est=median(Estimate),
            low=quantile(Estimate,.05),
            high=quantile(Estimate,.95))

applied_est <- bind_rows(list(OLS=gather(as_tibble(out_lm),key="variable",value="marg"),
                         `Ordered\nBeta`=all_vars_ord,
               `Transformed\nBeta`=all_vars_Beta_trans,
               `Fractional\nLogit`=all_vars_frac,
               `ZOIB`=all_vars_zoib),.id="Model") %>% 
  group_by(Model,variable) %>% 
  mutate(Draws=1:n(),
         Statistic="Coefficient Variance\n(% Change Vs. Ordered Beta)") %>% 
  group_by(Model,Statistic,variable) %>% 
  summarize(Estimate=var(marg)) %>% 
  group_by(Model,Statistic) %>% 
  summarize(Estimate=sum(Estimate)) %>% 
  group_by(Model,Statistic) %>% 
  summarize(med_est=median(Estimate)) %>% 
  ungroup %>% 
  mutate(med_est=((med_est - med_est[Model=="Ordered\nBeta"])/med_est[Model=="Ordered\nBeta"])*100) %>% 
  bind_rows(applied_est)

loo_tib  <- tibble(Model=c("Ordered\nBeta","ZOIB","Fractional\nLogit","OLS"),
                   Statistic="ELPD-LOO",
                  med_est=c(loo_ord$estimates["elpd_loo","Estimate"],
                            loo_zoib$estimates["elpd_loo","Estimate"],
                            loo_frac$estimates["elpd_loo","Estimate"],
                            loo_lm$estimates["elpd_loo","Estimate"]),
                  low=c(loo_ord$estimates["elpd_loo","Estimate"] - loo_ord$estimates["elpd_loo","SE"]*1.96,
                            loo_zoib$estimates["elpd_loo","Estimate"] - loo_zoib$estimates["elpd_loo","SE"]*1.96,
                            loo_frac$estimates["elpd_loo","Estimate"] - loo_frac$estimates["elpd_loo","SE"]*1.96,
                            loo_lm$estimates["elpd_loo","Estimate"] - loo_lm$estimates["elpd_loo","SE"]*1.96),
                  high=c(loo_ord$estimates["elpd_loo","Estimate"] + loo_ord$estimates["elpd_loo","SE"]*1.96,
                            loo_zoib$estimates["elpd_loo","Estimate"] + loo_zoib$estimates["elpd_loo","SE"]*1.96,
                            loo_frac$estimates["elpd_loo","Estimate"] + loo_frac$estimates["elpd_loo","SE"]*1.96,
                            loo_lm$estimates["elpd_loo","Estimate"] + loo_lm$estimates["elpd_loo","SE"]*1.96))

loo_p <- tibble(Model=c("Ordered\nBeta","ZOIB",
                        "Fractional\nLogit","Transformed\nBeta","OLS"),
                   Statistic="No. Parameters",
                  med_est=c(loo_ord$estimates["p_loo","Estimate"],
                            loo_zoib$estimates["p_loo","Estimate"],
                            loo_frac$estimates["p_loo","Estimate"],
                            loo_trans$estimates["p_loo","Estimate"],
                            loo_lm$estimates["p_loo","Estimate"]),
                low=c(loo_ord$estimates["p_loo","Estimate"] - loo_ord$estimates["p_loo","SE"]*1.96,
                            loo_zoib$estimates["p_loo","Estimate"] - loo_zoib$estimates["p_loo","SE"]*1.96,
                            loo_frac$estimates["p_loo","Estimate"] - loo_frac$estimates["p_loo","SE"]*1.96,
                      loo_trans$estimates["p_loo","Estimate"] - loo_trans$estimates["p_loo","SE"]*1.96,
                            loo_lm$estimates["p_loo","Estimate"] - loo_lm$estimates["p_loo","SE"]*1.96),
                  high=c(loo_ord$estimates["p_loo","Estimate"] + loo_ord$estimates["p_loo","SE"]*1.96,
                            loo_zoib$estimates["p_loo","Estimate"] + loo_zoib$estimates["p_loo","SE"]*1.96,
                            loo_frac$estimates["p_loo","Estimate"] + loo_frac$estimates["p_loo","SE"]*1.96,
                         loo_trans$estimates["p_loo","Estimate"] + loo_trans$estimates["p_loo","SE"]*1.96,
                            loo_lm$estimates["p_loo","Estimate"] + loo_lm$estimates["p_loo","SE"]*1.96))


  bind_rows(applied_est,loo_tib,loo_p) %>% 
    mutate(Model=factor(Model),
           Model=fct_relevel(Model,"Ordered\nBeta",after = 5)) %>% 
    ggplot(aes(y=med_est,x=Model)) +
    theme_tufte() +
    theme(panel.border = element_rect(fill=NA)) +
  geom_pointrange(aes(ymin=low,
                     ymax=high),size=.4,alpha=0.5) +
  coord_flip() +
  geom_text(aes(label=signif(med_est,3)),vjust="bottom",hjust="inward",
            nudge_x=.1,size=2.7) +
  facet_wrap(~Statistic,scales="free_x",nrow=3) + 
  ylab("") +
  xlab("") +
  labs(caption="Uncertainty intervals are the empirical 5% to 95% posterior quantiles.")

```

Interestingly, OLS has superior performance over other models in terms of $\hat{elpd}_{psis_loo}$, and RMSE, which is likely due to the prevalence of continuous values (in the (0,1) interval) in the outcome. As the proportion of degenerate responses increases, OLS will perform less well on these metrics. The fact that OLS can perform well while still being the "wrong" model shows the limitation in solely focusing on predictive validity.

On the other hand, we see that ordered Beta regression does quite well in RMSE and $\hat{elpd}_{psis_loo}$, even if not as well as OLS, while also having noticeably lower effective number of parameters and coefficient (marginal effect) variance. Furthermore, we know from the simulation that this superior performance will hold across very different distributions with more or less degenerate versus continuous responses.

It should be noted here and later that the fractional logit under-performed. This is most likely due to the prevalence of continuous responses; fractional logit is derived from the binomial distribution and so it will usually perform better when the responses are at the extremes rather than in the middle of the distribution. The peculiarly low effective number of parameters for fractional logit and the transformed Beta regression, combined with the models' higher variance and poor predictive performance, suggests that these ad hoc approaches do not compare with Beta regression variants that directly model the unique features of bounded distributions.

Figure \@ref(fig:postpred) shows draws of the posterior predictive distribution in gray with the empirical (true) distribution in blue. OLS is the only one of the models considered that predicts outside of [0,1]. In general, it would seem that OLS, ZOIB and ordered beta regression have the best overall fit to the distribution (so long as OLS' out-of-sample predictions are ignored). OLS fits the lower continuous mode better while ZOIB and ordered beta fit the higher mode more closely. The fractional logit tends to over-emphasize the degenerate responses and under-emphasize the continuous responses, while the transformed Beta plot reveals that the transformation does induce serious model misfit.

```{r postpred,fig.cap="Comparison of Sample to Posterior Empirical Cumulative Distribution for Models"}

sample_rows1 <- sample(1:nrow(yrep_ord),100)
sample_rows2 <- sample(1:nrow(yrep_ols),100)
# first all vals

lm_dens <- ppc_dens_overlay(pew_fit_lm$data$therm[sample_all],as.matrix(posterior_predict(pew_fit_lm))[sample_rows2,sample_all]) +
  ggtitle("OLS") + 
  ylab("Empirical Density")
ord_dens <- ppc_dens_overlay(c(to_bl$outcome_degen[sample_degen],to_bl$outcome_prop[sample_prop]),as_draws_matrix(fit_pew$draws("regen_all"))[sample_rows1,]) +
  ggtitle("Ordered Beta")
zoib_dens <- ppc_dens_overlay(model_data$therm[sample_all],as_draws_matrix(zoib_fit$draws("zoib_regen"))[sample_rows1,]) +
  ggtitle("ZOIB")
frac_dens <- ppc_dens_overlay(as.numeric(model_data$therm[sample_all]),yrep_frac[sample_rows1,]) +
  ggtitle("Fractional")
Beta_trans_dens <- ppc_dens_overlay(as.numeric(model_data$therm[sample_all]),as.matrix(posterior_predict(beta_trans_fit,))[sample(1:nrow(yrep_Beta_trans),100),sample_all]) +
    ggtitle("Transformed\nBeta") + 
    ylab("Empirical Density")
lm_dens + ord_dens + zoib_dens + Beta_trans_dens + frac_dens +
  plot_layout(guides = 'collect') +
  plot_annotation(caption=TeX("Value of $y$"))  &
                  theme(plot.caption = element_text(hjust=0.5),
                              axis.text.x=element_text(size=7))
```

I next compare estimated sample average marginal effects from each model in Figure \@ref(fig:combinecoef). For some covariates, such as Revolution (whether a neighboring country experienced a democratic revolution), the estimates are of the same sign, though not necessarily the same magnitude. In other cases, such as $\text{Population}_{t-1}$, OLS and the ZOIB/ordered beta regression estimates have opposite signs, and for $\text{GDP}_{t-1}$, ordered beta regression shows a strongly positive effect while the ZOIB shows an estimated effect of zero. However, generally speaking the two models have coefficients that are of the same sign, although the magnitude can vary significantly.

The heterogeneity in the transformed Beta regression estimates is another worrying sign that nudging the degenerate responses has a very strong implication for inference. The marginal effects for this model vary quite significantly from ordered Beta regression and the ZOIB even though these latter two specifications also incorporate the Beta distribution. Because the data transformation depends on $N$, the minimum allowed value for the outcome in this case is 0.0000082, while the maximum value is 0.9999918. As such, this seemingly minute transformation will have strong consequences on the estimation of the Beta regression as it takes into account these very unlikely observations. The fact that the variance is less than half that of ordered beta regression is another concerning sign that the data transformation may also lead to an inflation of certainty in the results.

```{r combinecoef,fig.cap="Estimated Marginal Effects of Aidt and Jensen (2012) Covariates on Suffrage Index (0 - 100)",fig.width=6,fig.height=8.5}

recode_marg <- bind_rows(list(OLS=gather(as_tibble(out_lm),key="variable",value="marg"),
                         `Ordered\nBeta`=all_vars_ord,
               `Beta\nTransformed`=all_vars_Beta_trans,
               `Fractional\nLogit`=all_vars_frac,
               `ZOIB`=all_vars_zoib),.id="model") %>% 
  filter(!(variable %in% c("R2","sigma","log-fit_ratio","(Intercept)","lp__","Intercept","b_Intercept"))) %>% 
  mutate(variable=fct_recode(variable,
  "Gold Standard" = "b_Gold_standard",
  "Suffrage_t-1" = "b_lage2c",
  "Enrollment_t-1" = "b_lagenrollment",
  "GDP_t-1" = "b_laggdp",
  "Pop_t-1" = "b_lagn",
  "Urban_t-1" = "b_lagurban",
  "Revolution" = "b_revolution4b",
  "War" = "b_war",
  "WWI" = "b_ww1",
  "Gold Standard" = "Gold_standard",
  "Suffrage_t-1" = "lage2c",
  "Enrollment_t-1" = "lagenrollment",
  "GDP_t-1" = "laggdp",
  "Pop_t-1" = "lagn",
  "Urban_t-1" = "lagurban",
  "Revolution" = "revolution4b",
  "War" = "war",
  "WWI" = "ww1"
))
  recode_marg %>% 
    group_by(model,variable) %>% 
    mutate(marg=marg*100) %>% 
  summarize(med_est=median(marg),
            high=quantile(marg,.95),
            low=quantile(marg,.05)) %>% 
    # filter(model %in% c("Ordered\nBeta","ZOIB","OLS","Fractional\nLogit")) %>% 
  ggplot(aes(y=med_est,x=model)) +
  geom_pointrange(aes(ymin=low,ymax=high,colour=model,shape=model),position=position_dodge(width=.5)) +
  scale_color_viridis_d() +
  theme_minimal() +
  geom_hline(yintercept = 0,linetype=2) +
  ylab("Marginal Effects on Suffrage (0 to 100)") +
  theme(legend.position = "top",axis.text.y=element_text(size=9),
        panel.background = element_rect(fill=NA)) +
  guides(color="none",shape="none") +
  facet_wrap(~variable,
             scales="free",ncol=2) +
  xlab("") +
    labs("Uncertainty  intervals are the 5% to 95% empirical posterior quantiles.") +
  coord_flip() 

```

While it can be difficult to identify differences in uncertainty intervals visually, Figure \@ref(fig:modcompare) shows that the total variance in estimated marginal effects is much lower for ordered beta regression than for the ZOIB and OLS. While in this particular estimation the amount of uncertainty did not impede inference across models due to the size of the data, the ability of ordered beta regression to capture both the peculiar features of the distribution and precise marginal effects over the entire distribution is a clear advantage. As was shown in Figure \@ref(fig:lookurt), ordered beta regression is likely to require a smaller sample size to detect an effect relative to alternatives.

<!--# These comparisons can help us understand the divergent marginal effects reported in Figure @ref(fig:combinecoef). Beta regression with only continuous values shows no differentiation among education categories, while for income categories it shows an opposite relationship, with richer respondents holding stronger views for college professors. OLS, which also downplays values at the extremes, is close to the continuous Beta distribution in its interpretation of these effects. The transformed Beta regression is close to the continuous distributions for education, and quite confusingly, close to the ordered Beta regression for income (as has probably become clear by this point, this transformation has potentially severe consequences on correctly modeling the response).  -->

```{r extreme,include=F,eval=F}

# look at extreme outcomes

model_data <- mutate(model_data,extreme=as.numeric(therm %in% c(0,1)))

model_data %>% select(GDP="laggdp",therm) %>%
  mutate(therm_type=case_when(therm==0~ "0",
                              therm>0 & therm<1~"(0,1)",
                              therm==1~"1")) %>% 
    ggplot(aes(y=therm,x=GDP)) +
  geom_point() +
  facet_wrap(~therm_type)
  

```

<!--# One final note on the bottom row of Figure @ref(fig:zoibcoef) are the fixed probabilities for $Pr(y_i>0 \cap Pr(y_i<1))$. These fixed values are due to the fact that the cutpoints are fixed for the entire sample. It is possible to allow the cutpoints to vary across groups, or even to parameterize the cutpoints with covariates to permit more subtle gradations. However, as the analysis with the ZOIB model shows, it is important to know theoretically why this flexibility is required, as it will affect the interpretation of the underlying estimates. The employment of cutpoints has the advantage that any additional flexibility can be incorporated as-needed rather than allowing for completely independent processes as with the ZOIB.  -->

```{r zoibcoef,fig.cap="Comparison of Intermediate Probabilities for Ordered Beta Regression and ZOIB",fig.height=8.5,fig.width=6}

# facet_labels <- c(TeX("$Pr(y=0)$",output="character"),
#                       TeX("$Pr(y=1)$",output="character"),
#                       TeX("$Pr(y>0 \\bigcap y<1)$",output="character"),
#                       TeX("Beta($y$)",output="character"))

facet_labels <- c("Pr(y=0)",
                      "Pr(y=1)",
                      "Pr(y>0) \u2229 Pr(y<1)",
                      "Beta(y)")

# need to check all the zoib coefs

# drop intercept

mat_data <- x_orig

zoib_all_prob <- lapply(just_vars,function(c,coef_a,coef_m,coef_g,alpha) {
  
  if(all(mat_data[!is.na(mat_data[,c]),c] %in% c(0,1))) {
    pred_data_high <- mat_data
    
    pred_data_high[,c] <- 0
    
    pred_data_low <- mat_data
    
    pred_data_low[,c] <- 1
  } else {
    pred_data_high <- mat_data
    
    pred_data_high[,c] <- pred_data_high[,c] + setstep(pred_data_high[,c])
    
    pred_data_low <- mat_data
    
    pred_data_low[,c] <- pred_data_low[,c] - setstep(pred_data_low[,c])
  }
  
  
  
  margin_ord <- lapply(1:nrow(X_beta_ord), function(i,this_col) {
    y0 <- predict_zoib(X=pred_data_low,
                          coef_m=as.numeric(coef_m[i,]),
                          coef_a=as.numeric(coef_a[i,]),
                          coef_g=as.numeric(coef_g[i,]),
                          alpha1=alpha[i,1],
                          alpha2=alpha[i,2],
                          alpha3=alpha[i,3],
                       combined_out = F)
    
    y1 <- predict_zoib(X=pred_data_high,
                          coef_m=as.numeric(coef_m[i,]),
                          coef_a=as.numeric(coef_a[i,]),
                          coef_g=as.numeric(coef_g[i,]),
                          alpha1=alpha[i,1],
                          alpha2=alpha[i,2],
                          alpha3=alpha[i,3],
                       combined_out = F)
    
    marg_eff_0 <- (y1$pr_zero-y0$pr_zero)/(pred_data_high[,this_col]-pred_data_low[,this_col])
    marg_eff_1 <- (y1$pr_one-y0$pr_one)/(pred_data_high[,this_col]-pred_data_low[,this_col])
    marg_middle <- (y1$pr_proportion-y0$pr_proportion)/(pred_data_high[,this_col]-pred_data_low[,this_col])
    marg_eta <- (plogis(y1$proportion_value)-plogis(y0$proportion_value))/(pred_data_high[,this_col]-pred_data_low[,this_col])

    tibble(marg=c(mean(marg_eff_0),
             mean(marg_eff_1),
             mean(marg_middle),
             mean(marg_eta)),
             estimate=facet_labels)
  },c) %>% bind_rows
  

  margin_ord$variable <- colnames(mat_data)[c]
  
  margin_ord
  
},coef_m=X_beta_zoib,coef_a=coef_a,coef_g=coef_g,alpha=alpha) %>% bind_rows

# repeat for regular ordered regression

mat_data <- rbind(x_orig[model_data$therm_type=="outside",],
                  x_orig[model_data$therm_type=="inside",])

ord_reg_all_prob <- lapply(just_vars,function(c) {
  
  if(all(mat_data[!is.na(mat_data[,c]),c] %in% c(0,1))) {
    pred_data_high <- mat_data
    
    pred_data_high[,c] <- 0
    
    pred_data_low <- mat_data
    
    pred_data_low[,c] <- 1
  } else {
    pred_data_high <- mat_data
    
    pred_data_high[,c] <- pred_data_high[,c] + setstep(pred_data_high[,c])
    
    pred_data_low <- mat_data
    
    pred_data_low[,c] <- pred_data_low[,c] - setstep(pred_data_low[,c])
  }
  
  
  margin_ord <- lapply(1:nrow(X_beta_ord), function(i,this_col) {

    y0 <- predict_ordbeta(cutpoints=cutpoints_est[i,],
                          X=pred_data_low,
                          X_beta=t(X_beta_ord[i,,drop=F]),
                          alpha=as.numeric(alpha_ord)[i],
                          combined_out = F)

    y1 <- predict_ordbeta(cutpoints=cutpoints_est[i,],
                          X=pred_data_high,
                          alpha=as.numeric(alpha_ord)[i],
                          X_beta=t(X_beta_ord[i,,drop=F]),
                          combined_out = F)
    
    marg_eff_0 <- (y1$pr_zero-y0$pr_zero)/(pred_data_high[,this_col]-pred_data_low[,this_col])
    marg_eff_1 <- (y1$pr_one-y0$pr_one)/(pred_data_high[,this_col]-pred_data_low[,this_col])
    marg_middle <- (y1$pr_proportion-y0$pr_proportion)/(pred_data_high[,this_col]-pred_data_low[,this_col])
    marg_eta <- (y1$proportion_value-y0$proportion_value)/(pred_data_high[,this_col]-pred_data_low[,this_col])

    tibble(marg=c(mean(marg_eff_0),
             mean(marg_eff_1),
             mean(marg_middle),
             mean(marg_eta)),
             estimate=facet_labels)
  },c) %>% bind_rows()
  
  margin_ord$variable <- colnames(mat_data)[c]
  
  margin_ord
  
}) %>% bind_rows

compare_prob <-  bind_rows(list(`ZOIB`=zoib_all_prob,
                `Ordered Beta`=ord_reg_all_prob),.id="model") %>% 
    group_by(model,variable,estimate) %>% 
  summarize(med_est=median(marg*100),
            high=quantile(marg*100,.95),
            low=quantile(marg*100,.05)) 

compare_prob %>% 
  mutate(estimate=factor(estimate,levels=c("Beta(y)",
                                           "Pr(y>0) \u2229 Pr(y<1)",
                      "Pr(y=1)","Pr(y=0)")),
         variable=fct_recode(variable,
  "Gold Standard" = "Gold_standard",
  "Suffrage_t-1" = "lage2c",
  "Enrollment_t-1" = "lagenrollment",
  "GDP_t-1" = "laggdp",
  "Population_t-1" = "lagn",
  "Urban_t-1" = "lagurban",
  "Revolution" = "revolution4b",
  "War" = "war",
  "WWI" = "ww1"
)) %>% 
  filter(variable!="Suffrage_t-1") %>% 
  ggplot(aes(y=med_est,x=variable)) +
  geom_pointrange(aes(ymin=low,ymax=high,colour=model,shape=model),position=position_dodge(width=.5)) +
  scale_color_viridis_d() +
  theme_tufte() +
  geom_hline(yintercept = 0,linetype=2) +
  ylab("Suffrage Index") +
  theme(legend.position = "top",
        panel.border = element_rect(fill=NA)) +
  guides(color=guide_legend(title=""),shape=guide_legend(title="")) +
  facet_wrap(~fct_rev(estimate),scales="free_x",ncol=2) +
  xlab("") +
  coord_flip()

```

Finally, Figure \@ref(fig:zoibcoef) disaggregates the estimated effects for ordered beta and the ZOIB to make it clear why the models in some cases diverge. To do so, I calculate the marginal effect of covariates on the probability of the three possible outcome types separately (i.e., $Pr(y=1)$, $Pr(y=0)$ and $Pr(y>0) \cap Pr(y<1)$). I also include the marginal effect of the covariate on the Beta density of $y$.

First, it should be noted that the disparities do not arise from the underlying Beta regression. In all cases, the estimated marginal effects from the Beta regression are similar for both the ZOIB and ordered beta. Rather, the dissimilarities arise from the effect of the covariates on the probability of degenerate versus continuous responses. If a covariate has a positive effect in the ordered beta regression model, then by necessity, $Pr(y=0)$ will decrease and $Pr(y=1)$ and $Pr(y>0) \cap Pr(y<1)$ will increase (though not necessarily in equal proportions). This pattern can be seen for the ordered beta regression results: though the marginal effects for $Pr(y=1)$ are smaller due to its rarity as an outcome, they are always the opposite sign of the $Pr(y=0)$ effects.

By contrast, for the ZOIB, marginal effects can have independent influences on these three probabilities. This pattern can be seen for the $\text{GDP}_{t-1}$ covariate. The ZOIB estimates that rising GDP will increase the probability of zero suffrage ($Pr(y=0)$), increase the probability of full suffrage ($Pr(y=1)$), while simultaneously reducing the probability of intermediate suffrage ($Pr(y>0) \cap Pr(y<1)$) and increasing the value of intermediate suffrage conditional on intermediate suffrage being reached ($\text{Beta}(y)$). The ordered beta regression model, by contrast, estimates that rising GDP is associated with a lower chance of zero suffrage, rising probability of intermediate and high suffrage, as well as a higher value of intermediate suffrage. This latter set of associations seems to follow more logically from the definition of the outcome as a single scale of democratic inclusion.

It is of course possible that the ZOIB and ordered beta regression produce the same interpretation. The Revolution covariate, which has a similar aggregate effect in Figure \@ref(fig:combinecoef), likewise has similar disaggregated effects across outcome types in Figure \@ref(fig:zoibcoef), revealing that the two models can overlap if the ZOIB meets the ordered Beta model's assumptions. In summary, both the ZOIB and the ordered beta model treat the degenerate responses as qualitatively different, but the ZOIB goes as far as allowing them to be completely independent processes, which can have confusing implications for inference when the scale is a single construct. By allowing for limited heterogeneity in the bounded outcome, ordered beta regression can estimate a single marginal effect across the entire distribution, which corresponds more closely to the desired interpretation in this study and many others with a single measured construct.

# Discussion

<!-- The empirical and simulation-based analysis of the ordered Beta regression model show that it is an appealing alternative to existing approaches. The application of the model in this paper is considered primarily in terms of individuals answering survey scales of one kind or another. This aspect is important to evaluating the model, as it suggests that the ordered Beta regression's more parsimonious interpretation of the effect of covariates on the outcome is preferable to the more fully parameterized ZOIB. It is assumed that what the analyst desires is the effect of a covariate on the full distribution, which includes both degenerate and continuous responses. There is no particular reason to believe, however, that the effect should vary between the degenerate and continuous responses, or even between lower and upper degenerate responses. -->

Given these discrepancies between the ZOIB and the ordered beta regression, I consider in this section when the ZOIB should be employed. To do so, I return to the zero-or-one model addressed earlier [@ferrari2012]. While I did not consider this model because it does not produce a defined marginal effect over the full distribution, it may also be more widely applicable than the ZOIB. In the empirical example presented, a zero-or-one model could be used if the value of zero suffrage was treated as a different regime type (i.e., dictatorship) rather than a continuation of the latent measure. This parameterization would follow from the idea of a selection model in which cases must first select into continuous responses.

The ZOIB could then be understood as an even more specific case of the zero-or-one model in which selection could occur at both ends of the response. That is, units could begin in one category (0), proceed through a continuous process (0,1), and then end in a separate category (1). While this type of situation can certainly occur, it is a more specific process than selection at only one end of the scale as the zero-or-one model proposes.

As such, while the ZOIB and ordered beta regression can fit a [0,1] distribution accurately, and can even return similar marginal effects, they make quite different statements about how degenerate responses are produced. The ordered beta model considers all of the response to be generated by a single latent process, which permits more precise estimation and also a clear interpretation about what the underlying components of the parameters mean. If a scholar is looking to model a single bounded continuous scale, ordered beta regression should provide a reasonable fit to the data and meaningful covariate estimates.

<!--# In fact, due to the employment of weakly informative priors, it is even possible to fit the ordered Beta regression model to distributions without any degenerate responses. These estimations can still be useful when the analyst believes degenerate responses could have risen, but did not in the particular sample of data under analysis. This type of situation is likely to arise when the data come in repeated samples, such as approval rating polls aggregated over time.   -->

<!-- In the supplemental information I further show the results of a regression of the precision parameter in the ordered Beta regression. These results are intriguing as they show that wealthier and more educated respondents are more likely to hold much more heterogeneous views, while poorer, less educated and more conservative respondents tend to cluster together. The ability to better understand the higher-order moments of the distribution is a general reason why Beta-distributed models are superior to simpler approaches. -->

<!--# These results also show that predictive validity measures, especially $\hat{elpd}_{psis_loo}$ and RMSE, are only of limited value when evaluating models on a single sample because they are affected by  the modality of  the sample. Another assumption in this paper is that the primary objective is to learn about the effect of covariates on the outcome rather than predicting future responses. If that were the aim, it would be worthwhile to consider a broader range of models with considerable predictive power but limited explanatory value, including random forests and neural networks.  -->

# Conclusion

This paper presented a new model for bounded continuous distributions with considerable observations on the bounds. This paper builds on prior models in the literature incorporating the Beta distribution, which has admirable properties for evaluating the unique features of bounded data. The extension of Beta regression to cover degenerate responses at the bounds is made possible by employing cutpoints that permit the 0 and 1 values to be jointly estimated with the Beta distribution. Compared to existing approaches, particularly zero-and-one inflated Beta regression (ZOIB) and OLS, ordered Beta regression is able to capture a more precise and interpretable marginal effect of a given covariate on the outcome over the full response range.

# References
